Utilizing interband acoustical information for modeling stationary time-frequency regions of noisy speech
Author
Abstract
A novel enhancement system is developed that exploits the properties of stationary regions localized in both time and frequency. This system selects stationary time-frequency (TF) regions and adaptively enhances each region according to its local signal-to-noise ratio (LSNR) while utilizing both the acoustical knowledge of speech and the masking properties of the human auditory system. Each region is enhanced for maximum noise reduction while minimizing distortion. This paper evaluates the proposed system through informal listening tests and some objective measures.
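The idea of enhancing each TF region according to its local SNR can be illustrated with a minimal sketch. It is not the system described in the abstract: it assumes fixed rectangular blocks instead of adaptively selected stationary regions, swaps in a plain Wiener-style gain, and ignores auditory masking. The function name `local_snr_gains` and the parameters `block` and `gain_floor` are hypothetical, chosen only for illustration.

```python
import numpy as np

def local_snr_gains(noisy_power, noise_power, block=(4, 4), gain_floor=0.1):
    """Return one Wiener-style gain per rectangular time-frequency block.

    noisy_power: (freq, time) array of noisy-speech power
    noise_power: (freq, time) array of estimated noise power
    block:       (freq bins, time frames) per region; fixed size is an assumption
    gain_floor:  lower bound on the gain, an assumed way to limit distortion
    """
    n_freq, n_time = noisy_power.shape
    gains = np.empty((n_freq, n_time), dtype=float)
    for f0 in range(0, n_freq, block[0]):
        for t0 in range(0, n_time, block[1]):
            region = (slice(f0, f0 + block[0]), slice(t0, t0 + block[1]))
            y = float(noisy_power[region].sum())
            n = float(noise_power[region].sum())
            # Local SNR estimated by power subtraction, clipped at zero
            lsnr = max(y - n, 0.0) / max(n, 1e-12)
            # Wiener-style gain, shared by every bin in the region
            gains[region] = max(lsnr / (1.0 + lsnr), gain_floor)
    return gains
```

Under these assumptions, the returned gains would be multiplied element-wise with the noisy short-time spectrum before resynthesis. Regions with a high LSNR pass almost unchanged, while regions dominated by noise are attenuated down to the floor.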
Similar resources
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
A glimpsing model of speech perception in noise.
Do listeners process noisy speech by taking advantage of "glimpses", that is, spectrotemporal regions in which the target signal is least affected by the background? This study used an automatic speech recognition system, adapted for use with partially specified inputs, to identify consonants in noise. Twelve masking conditions were chosen to create a range of glimpse sizes. Several different glimpsing m...
Stochastic perceptual models of speech
We have recently developed a statistical model of speech that avoids a number of current constraining assumptions for statistical speech recognition systems, particularly the model of speech as a sequence of stationary segments consisting of uncorrelated acoustic vectors. We further wish to focus statistical modeling power on perceptually-dominant and information-rich portions of the speech sig...
An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech.
Existing objective speech-intelligibility measures are suitable for several types of degradation; however, they turn out to be less appropriate when noisy speech is processed by a time-frequency weighting. To this end, an extensive evaluation is presented of objective measures for intelligibility prediction of noisy speech processed with a technique called ideal time frequency (...
Temporal resolution analysis in frequency domain linear prediction.
Frequency domain linear prediction (FDLP) is a technique for auto-regressive modeling of Hilbert envelopes. In this letter, the resolution properties of the FDLP model are investigated using synthetic signals with impulses immersed in noise. The effects of various factors on the temporal resolution are studied, and this analysis suggests ways to improve the resolution of the FDLP envelo...